ABSTRACT

Sample coordination, where similar instances have similar samples, was 
proposed by statisticians four decades ago as a way to maximize overlap in 
repeated surveys. Applications of coordinated sampling expanded to speed 
up approximate processing of queries, such as similarity and distinct counts, 
over massive data sets. Estimators, however, were derived in an ad-hoc 
manner and for some basic queries, including Euclidean distance, no good 
estimators were known. 

The usefulness of a sampling scheme hinges on the scope and accuracy 
within which queries posed over the original data can be answered from the 
sample. We aim here to gain a fundamental understanding of the limits and 
potential of coordination. Our main result is a precise characterization, in 
terms of simple properties of the estimated function, of queries for which 
estimators with desirable properties exist. We consider unbiasedness, non- 
negativity, finite variance, and bounded estimates. 

Since generally a single estimator cannot be optimal (minimize variance
simultaneously) for all data, we propose variance competitiveness, which 
means that variance on any data is not too far from the minimum one pos- 
sible for the data. Surprisingly perhaps, we show how to construct, for 
any function for which an unbiased nonnegative estimator exists, a variance 
competitive estimator. Moreover, the derived estimator is bounded when a 
bounded unbiased estimator exists. 

One consequence of our work is obtaining estimators for $L_p$ distances
over coordinated PPS samples which perform well also when only a small 
fraction of the population is sampled. 

1. INTRODUCTION 

Many data sources, including IP or Web traffic logs from dif- 
ferent time periods or locations, measurement data, snapshots of 
data depositories that evolve over time, and document/feature and 
market-basket data, can be viewed as a collection of instances, 
where each instance is an assignment of values from some set V 
to items. 

When the data is too massive to manipulate or even store in full, 
random samples facilitate fast approximate processing of queries 
such as summations, averages, difference queries, and distinct count- 
ing. Samples of different instances are coordinated when the "ran- 
dom bits" used to generate the sample of each instance are shared 
across instances in a way we will soon make precise. Coordination 
is a property of the joint distribution of the samples of different in- 
stances. The sample of one instance does not depend on values in 
other instances but items sampled in one instance are more likely 
to be sampled in others. 

Coordination can be incorporated into familiar per-instance sampling
schemes, including Poisson sampling (each item is sampled independently
with a probability that depends only on its value) and bottom-k (order)
sampling, which yields samples of a desired fixed size k. It is
convenient to define these per-instance sampling schemes through a
rank function, $r : [0,1] \times V \to \mathbb{R}$, which maps seed-value
pairs to a number $r(u,v)$ that is non-increasing with u and
non-decreasing with v. For each item h we draw a seed $u(h) \sim U[0,1]$
uniformly at random and compute the rank value $r(u(h), v(h))$,
where $v(h)$ is the value of h. With Poisson sampling, item h is
sampled $\iff r(u(h), v(h)) \ge T(h)$, where $T(h)$ are fixed
thresholds, whereas a bottom-k sample includes the k keys with
highest ranks¹. Poisson PPS samples (Probability Proportional to
Size [29], where each item is included with probability proportional
to its value) are obtained using the rank function $r(u,v) = v/u$
and a fixed $T(h)$ across items. Priority (sequential Poisson)
samples [34, 22, 39] are bottom-k samples utilizing the PPS ranks
$r(u,v) = v/u$, and successive weighted sampling without replacement
[36, 23, 10] corresponds to bottom-k samples with the rank
function $r(u,v) = -v/\ln(u)$.

Coordination between samples of different instances is achieved 
by reusing the same set of random seeds u(h) across instances. 
To facilitate scalable sharing of seeds when instances are dispersed 
in time or location, we can use random hash functions $u(h) \leftarrow
H(h)$, where the only requirement for our purposes is uniformity
and pairwise independence. Figure 1 contains an example data set
of two instances and the PPS sampling probabilities of each item in 
each instance, and illustrates how to coordinate the samples. 
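As a rough illustration of the seed-sharing idea above, the following sketch draws coordinated Poisson PPS samples of two instances by hashing item keys to seeds. It is not the authors' implementation: the helper names `seed` and `pps_sample`, the salt string, and the use of SHA-256 as a stand-in for a uniform, pairwise-independent hash function are all assumptions made for the example. The item values are inferred from the Figure 1 probabilities ($p = v/4$).

```python
import hashlib

def seed(item_key: str, salt: str = "demo-salt") -> float:
    """Map an item key to a reproducible pseudo-random seed u(h) in (0, 1].

    SHA-256 is only a convenient stand-in (assumption) for a uniform,
    pairwise-independent hash function H(h).
    """
    digest = hashlib.sha256(f"{salt}:{item_key}".encode()).digest()
    top53 = int.from_bytes(digest[:8], "big") >> 11  # 53 bits fit a float exactly
    return (top53 + 1) / float(1 << 53)

def pps_sample(instance: dict, threshold: float, salt: str = "demo-salt") -> dict:
    """Poisson PPS sample: keep item h iff r(u, v) = v / u >= threshold."""
    return {h: v for h, v in instance.items() if v / seed(h, salt) >= threshold}

# Item values inferred from the Figure 1 probabilities (p = v / 4).
instance1 = {"1": 1, "2": 0, "3": 4, "4": 1, "5": 0, "6": 2, "7": 3, "8": 1}
instance2 = {"1": 3, "2": 2, "3": 1, "4": 0, "5": 2, "6": 3, "7": 1, "8": 0}

# The same seed u(h) is reused for item h in both instances (coordination).
sample1 = pps_sample(instance1, threshold=4.0)
sample2 = pps_sample(instance2, threshold=4.0)
print(sorted(sample1), sorted(sample2))
```

Because the seed of an item is the same in both instances, an item that is sampled in the instance where its value is smaller is necessarily sampled in the other instance as well.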

Why coordinate samples? Sample coordination was proposed in 
1972 by Brewer, Early, and Joice [3], as a method to maximize 
overlap in repeated surveys [38, 35, 37]. The underlying set of
weights (amount of traffic on different road segments) may change 
over time, but surveys are expensive, and therefore we want the 
new sample to be both a true PPS sample according to the new 
weights and at the same time, overlap as much as possible with the 
previous sample (in order to minimize overhead of surveying new 
road segments). 

Coordination was subsequently used by computer scientists to
facilitate efficient processing of large data sets. Coordination
enables efficient sampling over unique items when there are multiple
occurrences, which is useful in stream processing and distributed
settings. Coordinated samples of instances are used as synopses
which facilitate better estimates of multi-instance functions such as
distinct counts (cardinality of set unions), cardinality of set inter- 
sections, quantile sums, and difference norms [5 4 81 1191 l33l 1251 
[2^ [2^ [12J [2l [28j [131 [J7] [T8) . Used this way, coordinated sam- 
pling can be cast as a form of Locality Sensitive Hashing (LSH)
(32|[27|[3]]- 

In [8] we observed that coordinated samples of all reachability
sets or all neighborhoods in a graph can be obtained in near lin- 
ear time. The main application considered was estimating sizes of 
neighborhoods. A similar approach was applied to estimate sums of
values stored at nodes in neighborhoods of a (sensor, p2p, gossip)
network [9, 33, 10, 12]. Once these samples are computed, they

1 The term bottom-k is due to historic usage of the inverse rank
function and lowest k ranks [36, 37, 10, 12, 13].
PPS sampling probabilities for T=4 (sample of expected size 3):



where $\mathrm{RG}_p$ is the exponentiated range $\mathrm{RG}_p(v) = |\max(v) - \min(v)|^p$.
In our example of Figure 1, $L_2^2$ of items $i \in [4]$ is $(1-3)^2 +
(2-0)^2 + (4-1)^2 + (1-0)^2 = 18$, and is computed by summing
the basic function $\mathrm{RG}_2(v_1, v_2) = (v_1 - v_2)^2$ over these items. The
$L_1$ of items {1, 3} is $|1-3| + |4-1| = 5$, using the basic
function $\mathrm{RG}(v_1, v_2) = |v_1 - v_2|$, and the max-sum of items {6, 7, 8}
is $\max\{2,3\} + \max\{3,1\} + \max\{1,0\} = 7$, which uses the
basic function $\max\{v_1, v_2\}$. Other queries in our examples which
are not in sum aggregate form can be approximated well by sum
aggregates: The Jaccard similarity is a ratio of min-sum and
max-sum, and thus the ratio of a min-sum estimate with small additive
error and a max-sum estimate with small relative error [17, 18] is
a good Jaccard similarity estimator. The $L_p$ difference is the pth
root of $L_p^p$, and can be estimated well by taking the pth root of a
good estimator for $L_p^p$, which is a sum aggregate of $\mathrm{RG}_p$. When
the domain of query results is nonnegative, as is the case in the
examples, the common practice, which we follow, is to restrict the
estimates, which often are plugged in instead of the exact result, to
be nonnegative as well.
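The worked numbers above can be checked with a few lines of code. This is only a sketch on the full (unsampled) data: the helper names `rg` and `sum_aggregate` are illustrative, and the item values are inferred from the Figure 1 sampling probabilities ($p = v/4$) together with the examples in the text.

```python
# Item values of the two instances, items 1..8 (inferred from Figure 1).
inst1 = [1, 0, 4, 1, 0, 2, 3, 1]
inst2 = [3, 2, 1, 0, 2, 3, 1, 0]

def rg(p):
    """Exponentiated range RG_p(v) = |max(v) - min(v)|^p as a basic function."""
    return lambda v: (max(v) - min(v)) ** p

def sum_aggregate(f, items):
    """Sum the basic function f over the value tuples of the selected items."""
    return sum(f((inst1[i - 1], inst2[i - 1])) for i in items)

all_items = range(1, 9)
print(sum_aggregate(rg(2), [1, 2, 3, 4]))   # L_2^2 of items 1..4
print(sum_aggregate(rg(1), [1, 3]))         # L_1 of items {1, 3}
print(sum_aggregate(max, [6, 7, 8]))        # max-sum of items {6, 7, 8}

# Jaccard similarity as a ratio of two sum aggregates (min-sum / max-sum).
jaccard = sum_aggregate(min, all_items) / sum_aggregate(max, all_items)
print(jaccard)
```

Writing each query as a sum of a basic function over value tuples is what allows the estimation problem to be handled one tuple at a time.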

Sum estimators - one tuple at a time: To estimate a sum ag- 
gregate, we can use an estimator which is the sum of single-tuple 
estimators, estimating the basic function f(v(h)) for each selected 
item h. We refer to such an estimator as a sum estimator. When 
the single-tuple estimators are unbiased, from linearity of expecta- 
tion, so is the sum estimate. When the single-tuple estimators are 
unbiased and sampling of different tuples is pairwise independent 
(respectively, negatively correlated, as with bottom-k sampling),
the variance of the sum is (resp., at most) the sum of variances 
of the single-tuple estimators. Therefore, the relative error of the 
sum estimator decreases with the number of selected items we ag- 
gregate over. We emphasize that unbiasedness of the single-tuple 
estimators (together with pairwise independence or negative corre- 
lations between tuples) is critical for good estimates for the sum 
aggregate since a variance component that is due to bias "adds 
up" with aggregation whereas otherwise the relative error "cancels 
out" with aggregation. More concretely, the sample of any particular
tuple is likely to have all or most entries missing, in which case
the variance of any nonnegative unbiased tuple estimator is
$\Omega((1/p) f(v)^2)$, where p is the probability that the outcome
reveals "enough"² on $f(v)$. In such a case, a fixed estimate of 0
(which is biased) has expected squared error $f(v)^2$, which is lower
than the variance of any unbiased estimator when p is small, but such
an estimate is clearly useless and results in a large error for the
sum aggregate.

A classic sum estimator, which is also unbiased and nonnegative,
is the Horvitz-Thompson (HT) estimator [30]. When there is
only one instance, the estimator is applied to one entry at a time,
and outputs an estimate of $f(v)/p$ when the entry is sampled and
0 otherwise (when the entry is sampled we know v and can therefore
deduce the probability $p = \sup\{u \in (0,1] : r(u,v) \ge T(h)\}$ that the
entry is sampled, and thus compute the HT estimate). The HT
estimator has minimum variance for each entry and therefore (on a
single instance) is the sum estimator with minimum variance for
all data. The HT estimator is applicable to some multi-instance
relations [17, 18] but may not be optimal even when it is applicable.
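A minimal sketch of the single-instance HT estimate under Poisson PPS sampling may help make the definition concrete. The function names are hypothetical, the basic function is taken to be $f(v) = v$, and the check averages over an evenly spaced grid of seeds rather than drawing random ones, which is an assumption made to keep the example deterministic.

```python
def ht_estimate(value: float, u: float, threshold: float) -> float:
    """HT estimate of one entry under Poisson PPS sampling with seed u in (0, 1]."""
    p = min(1.0, value / threshold)   # inclusion probability of the entry
    sampled = value > 0 and u <= p    # same event as value / u >= threshold
    return value / p if sampled else 0.0

def mean_over_seeds(value: float, threshold: float, n: int = 10_000) -> float:
    """Average the estimate over a grid of seeds (approximates the expectation)."""
    return sum(ht_estimate(value, (j + 0.5) / n, threshold) for j in range(n)) / n

for v in (0, 1, 3, 4):
    print(v, mean_over_seeds(v, threshold=4.0))
```

For a sampled entry with value below the threshold, the estimate equals the threshold itself, which is how a small inclusion probability is compensated by a larger estimate.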

The case for sum estimators: We argue that we have much to gain
and little to lose by restricting our attention to sum estimators.
Beyond simplicity, an important practical advantage of sum estimators
is that their properties are not sensitive to the degree of "indepen- 
dence" we have between the sampling of different tuples. This is 
important because higher degrees of independence are more diffi- 
cult to achieve. Unbiasedness of the sum estimate holds even when 
tuple samples are dependent and the variance relation which im- 
plies that the relative error decreases with aggregation requires only 
pairwise independence. In contrast, the performance of any non- 
sum estimator depends on the joint distribution of multiple (in the 

2 More precisely, let p be the probability that the outcome provides
a lower bound of at least $f(v)/2$. To be nonnegative, the
contribution to the expectation from the other $(1-p)$ portion of outcomes
cannot exceed $f(v)/2$, so the remaining contribution of the p
portion of outcomes must be at least $f(v)/2$, giving the lower bound
on the variance.



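Item:        1     2     3     4     5     6     7     8

Values (recovered as 4 times the sampling probability):
Instance 1:  1     0     4     1     0     2     3     1
Instance 2:  3     2     1     0     2     3     1     0

Sampling probabilities:
Instance 1:  0.25  0.00  1.00  0.25  0.00  0.50  0.75  0.25
Instance 2:  0.75  0.50  0.25  0.00  0.50  0.75  0.25  0.00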



Figure 1: Data for two instances with 8 items and respective
PPS sampling probabilities for threshold value 4, so an item with
value v is sampled with probability $\min\{1, v/4\}$. To obtain two
coordinated PPS samples of the instances, we associate an
independent $u(i) \sim U[0,1]$ with each item $i \in [8]$. We then sample
$i \in [8]$ in instance $h \in [2]$ if and only if $u(i) \le v/4$, where v is
the value of i in instance h. When coordinating the samples this
way, we make them as similar as possible for two PPS samples
for different sets of values. In this particular case, item 1 will
always (for any drawing of seeds) be sampled in instance 2 if it
is sampled in instance 1 and vice versa for item 7.



can be used for other queries such as distinct counts and similarity 
between pairs of neighborhoods which are useful in the analysis of 
massive graph datasets such as social networks or Web graphs. 

Broder [5, 4] used coordinated samples of features in documents
to estimate Jaccard similarity and quickly identify similar
documents. Gibbons [26] used coordination for distinct counts and
Gibbons and Tirthapura [25] looked at $L_1$ and union estimates. In
[11, 13, 17, 18] we provided generic constructions of estimators
for basic multi-instance functions, including quantile sums and $L_1$
difference.

To summarize, coordination is (i) purposely performed to obtain
samples or sample-based synopses which support estimates of
multi-instance relations, (ii) used to obtain more similar samples
when values change, as a way to reduce the overhead associated with
changing the sample, and (iii) sometimes much more efficient to
compute than, say, independent samples.

Our aim here is to study the estimation of multi-instance rela- 
tions over coordinated samples. The same set of samples can be 
used to perform multiple queries, including queries that may not
even be specified when the sampling is performed. In particular, the
sample we have for each instance can be used to estimate subset 
statistics over the instance, such as sums, averages, and moments. 
While we focus here on queries that span multiple instances, we 
keep this in mind, and therefore do not aim for a sampling scheme 
optimized for a particular query (although some of our results can 
be applied this way), but rather, to understand the potential and lim- 
its of each scheme with respect to different queries. In particular, 
we aim to optimize the estimator given the sampling scheme and 
query. 

Sum aggregates: Most queries in the above examples can be cast
as sum aggregates over selected items h of a basic function $f(v)$
applied to the item weight tuple in different instances $v(h) =
(v_1(h), v_2(h), \ldots)$. In particular, distinct count (set union) is a
sum aggregate of $\mathrm{OR}(v(h))$, max-sum aggregates $\max(v(h)) \equiv
\max_i v_i(h)$, min-sum aggregates $\min(v(h)) \equiv \min_i v_i(h)$, and
$L_p^p$ (the pth power of the $L_p$ difference) is the sum aggregate of $\mathrm{RG}_p(v(h))$,
worst case all) tuples. Lastly, sum estimators over independently
sampled tuples are variance optimal in a Pareto sense [14]: While
alternative estimators that have lower variance on some data may
exist, there are no estimators that strictly dominate sum estimators.
The HT estimator on a single instance dominates all other
estimators on sparse data, where most of the weight of the sum is
concentrated on fewer entries. Finally, a technical requirement for applying
sum estimators is that the sampling scheme of each tuple is explicit.
This naturally happens with Poisson sampling where an entry (item
h in instance i) is sampled $\iff r(u(h), v_i(h)) \ge T_i(h)$. With
bottom-k sampling, however, item inclusions are dependent and
therefore the separation a bit more technical, and relies on a tech- 
nique named rank conditioning [12 T|0 

To summarize, we argued that estimation of sum aggregates,
over Poisson or bottom-k samples, "reduces" to estimating
single-tuple functions $f(v) \ge 0$, where each entry of v is Poisson sampled
and we want these single-tuple estimators to be unbiased and
nonnegative. From here on, we focus our attention on unbiased and
nonnegative estimators for $f(v)$.

The challenge we address: Prior work, mostly based on adapta- 
tions of the HT estimator for multiple instances, lacked fundamen- 
tal understanding of when unbiased and nonnegative estimators ex- 
ist. The HT estimator is applicable provided that for any v where 
f(v) > 0, there is positive probability for an outcome that both 
reveals f(v) and allows us to determine a probability p for such an 
outcome. These conditions are satisfied by some basic functions 
including $\max(v)$ and $\min(v)$. There are functions, however, for
which it is not possible to obtain an HT estimator, but nonetheless, 
for which nonnegative and unbiased estimators exist. Moreover, 
the HT estimator may not be optimal even when defined. 

As a particular example, the only $L_p$ difference for which a
"satisfactory" estimator was known was the $L_1$ difference [17, 18].
Stated in terms of single-tuple estimators, prior to our work, there
was no unbiased and nonnegative estimator known for $\mathrm{RG}_p(v) =
|\max(v) - \min(v)|^p$ for any $p \ne 1$. For $p = 1$, the known
estimator used the relation $\mathrm{RG}(v) = \max(v) - \min(v)$,
separately estimating the maximum and the minimum and showing that
when samples are coordinated, the estimate for the maximum
is always at least as large as the one for the minimum, and
therefore the difference of the estimates is never negative. But even
for this $\mathrm{RG}(v)$ estimator, there was no understanding of whether it is
"optimal" and, more so, what optimality even means in this
context. Moreover, the ad hoc construction of this estimator does not
extend even to slight variations, like $\max\{v_1 - v_2, 0\}$, which
sum-aggregates to the natural "one sided" $L_1$ difference.

Overview of contributions 

Throughout the 40-year period in which coordination was used,
estimators were developed in an ad-hoc manner, lacking a funda- 
mental understanding of the potential and limits of the approach. 
We highlight our main contributions towards filling these gaps. 

Characterization: We provide a complete characterization, in terms
of simple properties of the function f and the sampling parameters,

3 Rank conditioning was applied to estimate subset sums over
bottom-k samples of a single instance, and allows us to treat each
entry as Poisson sampled. In rough details, for each entry, we
obtain a "$T_i(h)$ substitute" by conditioning the inclusion of h in the
sample of instance i on the seeds (and thus ranks) of all other items
being fixed. With this conditioning, the inclusion of h in the
sample of i is Poisson with threshold $T_i$, that is, follows the rule
$r(u(h), v(h)) \ge T_i$, where $T_i$ is equal to the kth largest rank value
of items in instance i with h excluded, which is the same as the
(k + 1)st largest rank value.



of when estimators with the following combinations of properties
exist for f:

• unbiasedness and nonnegativity,

• unbiasedness, nonnegativity, and finite variances, which means that for
all v, the variance given data v is finite,

• unbiasedness, nonnegativity, and bounded estimates, which means
that for each v, there is an upper bound on all estimates that can
be obtained when the data is v. Bounded estimates imply finite
variances, but not vice versa.

The J estimator: Our characterization utilizes a construction of an
estimator, which we call the J estimator, which we show has the
following properties: The J estimator is unbiased and nonnegative
if and only if an unbiased nonnegative estimator exists for f. The J
estimator has a finite variance for data v or is bounded for data v if
and only if a (nonnegative unbiased) estimator with the respective
property for v exists.

Variance competitiveness: We mentioned that when there is a sin- 
gle instance, the HT estimator is optimal. More precisely stated, 
under the requirement that for all valid data, the estimates are non- 
negative and unbiased, and that we use a sum estimator, the HT 
estimates have minimum variance for all data. 

With multiple instances, however, there may not be a single es- 
timator with minimum variance on all data vectors. We are there- 
fore aiming for a notion of variance competitiveness, which means 
that for any data vector, the variance of our estimator is not "too 
far" from the minimum variance possible for that vector by a non- 
negative unbiased estimator. More precisely, an estimator is c- 
competitive if for all data v, the expectation of its square is within 
a factor of c from the minimum possible for v by an estimator that 
is unbiased and nonnegative on all data. 

v-optimality: To study competitiveness, we need to compare the
variance on each data vector to the minimum possible, and to do
so, we need to be able to express the "best possible" estimates. We
say that an estimator is v-optimal if amongst all estimators that are
unbiased and nonnegative on all data, it minimizes variance for the
data v. We express the v-optimal estimates, which are the values a
v-optimal estimator assumes on outcomes that are consistent with
data v, and the respective v-optimal variance. We show that the
v-optimal estimates are uniquely defined (almost everywhere). The
v-optimal estimates, however, are inconsistent across different data,
since, as we mentioned earlier, it is not generally possible to obtain
a single unbiased nonnegative estimator that minimizes variance for
all data vectors. They do allow us, however, to analyse the
competitiveness of estimators, and in particular, that of the J estimator.

Competitiveness of the J estimator: We show that the J estimator 
is competitive. In particular, this shows the powerful and perhaps 
surprising result that whenever for any particular data vector v there 
exists an estimator with finite variance that is nonnegative and un- 
biased on all data, then there is a single estimator, that for all data, 
has variance that is close to the minimum possible. 

Practical implications: We demonstrate some of the practical
significance of our work in [16], where we derive and apply $L_p$
difference estimators for the exponentiated range functions
$\mathrm{RG}_p$ ($p > 0$), and experimentally study the performance of
the $L_1$ and $L_2$ estimators over PPS (and priority) samples of
various data sets.

The study demonstrates accurate estimates even when a small
fraction of the data set is sampled. To the best of our knowledge,
prior to our work, there was no good estimator for $L_p$ differences
over coordinated samples for any $p \ne 1$ and only a weaker
estimator was known for $p = 1$ [17]. The competitive ratios of the
estimators studied in [16] are 2 for the $L_1$ estimator and 2.5 for the
$L_2$ estimator.

Related work: The scope of our work is to understand what we 
can get from samples but we mention that there are other meth- 
ods to summarize data that support approximate queries. The LSH 
framework uses sketches of instances to approximate queries such 
as $L_1$ difference over data streams [24], $L_p$ difference over vectors
[21], and other similarity metrics [7]. Sketches (that are not
samples) can provide more accurate results for the specific queries they
are designed to answer but unlike samples, even a simple modi- 
fication of the query such as truncating the contribution of large 
values or confining the query to a selected subset of the items
typically cannot be efficiently supported in retrospect. Examples of
natural subset queries are the daily difference in IP flow volumes over
flows originating from a certain AS, or the difference between users in
California and Texas in the number of downloads of different
election-related YouTube videos.

We recently studied estimation over independent samples of
instances [14, 13], where the main application was set union and
quantile sum estimates. In a sense, coordinated and independent
sampling are the two interesting extremes of the joint distribution.
Our current work treats this other extreme, building on the
approach we initiated in [14, 15]. The simpler structure of
coordinated sampling enables us to gain a precise and more complete
understanding, which we hope can guide us to better answers for
independent samples.

2. COORDINATED SAMPLING MODEL 

The data is a vector $v = (v_1, v_2, \ldots, v_r) \in \mathbf{V} = V^r$, where
$V \subseteq \mathbb{R}_{\ge 0}$. Using the terminology in the introduction, we are now
looking at a single item (single tuple v) and the value $v_i$ of the ith
entry is the value of the item in instance i. The data is sampled
through a sampling scheme specified by non-decreasing continuous
functions $\tau = (\tau_1, \ldots, \tau_r)$ on [0, 1] with range containing
$(\min V, \max V)$. The outcome $S \equiv S(u,v)$ is the output of the
sampling scheme and is a function of a random seed $u \sim U[0,1]$
and the data v. We treat the outcome as a set where the ith entry is
included in S if and only if $v_i$ is at least $\tau_i(u)$:

$$i \in S \iff v_i \ge \tau_i(u) .$$

Sampling is PPS if $\tau_i(u)$ are linear functions: there is a fixed
vector $\tau^*$ such that $\tau_i(u) = u\tau_i^*$, in which case entry i is included
with probability $\min\{1, v_i/\tau_i^*\}$. Our use of the term PPS refers to
sampling of each instance i using threshold $\tau_i^*$, and the "projected"
sampling scheme on the ith entry of our tuple.

Observe that our model assumes weighted sampling, where the
probability that an entry is sampled depends on (and is non-decreasing
with) its value. Returning briefly to sampling of instances,
weighted sampling results in more accurate estimation of quantities
(such as averages or sums) where larger values contribute more. It
is also important for boolean domains ($V = \{0, 1\}$) when most
items have value 0, and in this case it enables us to sample only
"active" items.

We assume that the seed u and the functions $\tau$ are available to
the estimator, and in particular, treat the seed as provided with the
outcome. When an entry is sampled, we know its value and also
can compute the probability that it is sampled. When an entry is not
sampled, we know that its value is less than $\tau_i(u)$ and we can
compute this upper bound from the seed u and the function $\tau_i$. Putting
this information together, for each outcome $S(u, v)$, we can define
the set $V^*(S)$ of all data vectors consistent with the outcome. This
set captures all the information we can glean from the sample on


Figure 2: Illustration of the containment order on all possible
outcomes $V^*(S)$. Example has data vectors $\mathbf{V} = \{0, 1, 2\} \times
\{0, 1, 2\}$ and seed mappings $\tau_1 = \tau_2 \equiv \tau$. The root of the tree
corresponds to outcomes with $u \in (\tau^{-1}(2), 1]$. In this case, the
outcome reveals no information on the data and $V^*(S)$
contains all vectors in $\mathbf{V}$. When $u \in (\tau^{-1}(1), \tau^{-1}(2)]$ the
outcome identifies entries in the data that are equal to "2". When
$u \in (0, \tau^{-1}(1)]$, the outcome reveals the data vector.

the data. 

$$V^*(S) \equiv V^*(u, v) = \{ z \mid \forall i \in [r],\ (i \in S \wedge z_i = v_i) \vee (i \notin S \wedge z_i < \tau_i(u)) \} .$$
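The definitions of the outcome and of the consistent set can be sketched for PPS thresholds $\tau_i(u) = u\tau_i^*$ as follows. The function names `outcome` and `consistent` and the example vectors are illustrative assumptions, not part of the model itself.

```python
def outcome(u, v, tau_star):
    """Sampled entries {i: v_i} for seed u: entry i is sampled iff v_i >= u * tau*_i."""
    return {i: vi for i, (vi, t) in enumerate(zip(v, tau_star)) if vi >= u * t}

def consistent(z, u, v, tau_star):
    """True iff the vector z belongs to V*(u, v)."""
    for zi, vi, t in zip(z, v, tau_star):
        if vi >= u * t:
            # Entry is sampled, so its value is revealed and z must agree with it.
            if zi != vi:
                return False
        elif not zi < u * t:
            # Entry is not sampled: we only know the value is below the threshold.
            return False
    return True

v, tau_star = (1, 3), (4, 4)
print(outcome(0.5, v, tau_star))              # only the second entry is sampled
print(consistent((0, 3), 0.5, v, tau_star))   # agrees on the sampled entry
print(consistent((2, 3), 0.5, v, tau_star))   # 2 is not below the threshold 2
```

A vector that is consistent with the outcome at some seed stays consistent at every larger seed, since a larger seed reveals less information.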



Structure of the set of outcomes. From the outcome, which is
the set of sampled entries and the seed $\rho$, we can determine $V^*(u, v)$
also for all $u \ge \rho$. We also have that for all $u \ge \rho$ and $z \in
V^*(\rho, v)$, $V^*(u, z) = V^*(u, v)$. Fixing v, the sets $V^*(u, v)$ are
non-decreasing with u and the set S of sampled entries is
non-increasing, meaning that $V^*(u, v) \subseteq V^*(\rho, v)$ and $S(u, v) \supseteq
S(\rho, v)$ when $u \le \rho$.

The containment order of the sets $V^*(S)$ is a tree-like partial
order on outcomes. For two outcomes, the sets $V^*(S)$ are either
disjoint, and unrelated in the containment order, or one is fully
contained in another, and succeeds it in the containment order. The
outcome $S(u, v)$ precedes $S(\rho, v)$ in the containment order if and
only if $u > \rho$. When $\mathbf{V}$ is finite, the containment order is a tree
order, as shown in Figure 2.

When V is an interval of the nonnegative reals, then for z and v,
the set of all u such that $z \in V^*(u, v)$, if not empty, is a suffix of
(0, 1] that is open to the left.

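Lemma 2.1.

$$\forall \rho \in (0, 1]\ \forall v, \quad z \in V^*(\rho, v) \implies \exists \epsilon > 0,\ \forall x \in (\rho - \epsilon, 1],\ z \in V^*(x, v)$$

PROOF. Correctness for all $x \in [\rho, 1]$ follows from the structure
of the set of outcomes: Since $V^*(x, v) \supseteq V^*(\rho, v)$ for all $x \ge \rho$
then $z \in V^*(\rho, v) \implies z \in V^*(x, v)$.

Consider now the set S of entries that satisfy $v_i \ge \tau_i(\rho)$. Since
$z \in V^*(\rho, v)$, we have $\forall i \in S,\ z_i = v_i$ and $\forall i \notin S,\ \max\{z_i, v_i\} <
\tau_i(\rho)$. Since $\tau_i$ is continuous and monotone, for all $i \notin S$, there
must be an $\epsilon_i > 0$ such that $\tau_i(\rho - \epsilon_i) > \max\{z_i, v_i\}$. We now
take $\epsilon = \min_{i \notin S} \epsilon_i$ to conclude the proof. □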

3. ESTIMATORS AND PROPERTIES 

Let $f : \mathbf{V} \to \mathbb{R}_{\ge 0}$ be a function mapping $\mathbf{V}$ to the nonnegative reals. An
estimator $\hat{f}$ of f is a numeric function applied to the outcome. We
use the notation $\hat{f}(u, v) \equiv \hat{f}(S(u, v))$. On continuous domains,
an estimator must be (Lebesgue) integrable. An estimator is fully
specified for v if specified on a set of outcomes that have
probability 1 given data v. Two estimators $\hat{f}_1$ and $\hat{f}_2$ are equivalent if for
all data v, $\hat{f}_1(u, v) = \hat{f}_2(u, v)$ with probability 1.

An estimator $\hat{f}$ is nonnegative if $\forall S,\ \hat{f}(S) \ge 0$ and is unbiased
if $\forall v,\ E[\hat{f} \mid v] = f(v)$. An estimator has finite variance on v if
$\int_0^1 \hat{f}(u, v)^2\,du < \infty$ (the expectation of the square is finite) and
is bounded on v if $\sup_{u \in (0,1]} \hat{f}(u, v) < \infty$. If a nonnegative
estimator is bounded on v, it also has finite variance for v. We say
that an estimator is bounded or has finite variances if the respective
property holds for all $v \in \mathbf{V}$.
v-optimality. We say that an unbiased and nonnegative estimator
is v-optimal, that is, optimal with respect to a data vector v, if it 
has minimum variance for v. We refer to the estimates that a v- 
optimal estimator assumes on outcomes consistent on data v as the 
v-optimal estimates and to the minimum variance attainable for v 
as the v-optimal variance. 

Variance competitiveness. An estimator $\hat{f}$ is c-competitive if

$$\forall v, \quad \int_0^1 \left(\hat{f}(u, v)\right)^2 du \le c \inf_{\hat{f}'} \int_0^1 \left(\hat{f}'(u, v)\right)^2 du ,$$

where the infimum is over all unbiased nonnegative estimators $\hat{f}'$
of f. For any unbiased estimator, the expectation of the square is
closely related to the variance:

$$\mathrm{VAR}[\hat{f} \mid v] = \int_0^1 \left(\hat{f}(u, v) - f(v)\right)^2 du = \int_0^1 \hat{f}(u, v)^2\,du - f(v)^2 \tag{1}$$

When minimizing the expectation of the square, we also minimize
the variance. Moreover, c-competitiveness means that

$$\forall v, \quad \mathrm{VAR}[\hat{f} \mid v] \le c \inf_{\hat{f}'} \mathrm{VAR}[\hat{f}' \mid v] + (c - 1) f(v)^2 \tag{2}$$

for all data vectors v for which a nonnegative unbiased estimator
with finite variance on v exists, the variance of the estimator is at
most c times the v-optimal variance plus an additive term of $(c - 1)$
times $f(v)^2$.
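For completeness, the step from the definition of c-competitiveness to inequality (2) can be spelled out by substituting relation (1) on both sides. The derivation below is a sketch that uses only the definitions of this section, writing $\hat{f}$ for the estimator and $\hat{f}'$ for the competing unbiased nonnegative estimators:

```latex
\begin{align*}
\mathrm{VAR}[\hat{f} \mid v] + f(v)^2
  &= \int_0^1 \hat{f}(u,v)^2 \, du
     && \text{by (1)} \\
  &\le c \inf_{\hat{f}'} \int_0^1 \hat{f}'(u,v)^2 \, du
     && \text{by $c$-competitiveness} \\
  &= c \inf_{\hat{f}'} \left( \mathrm{VAR}[\hat{f}' \mid v] + f(v)^2 \right)
     && \text{by (1) applied to each } \hat{f}' \\
  &= c \inf_{\hat{f}'} \mathrm{VAR}[\hat{f}' \mid v] + c\, f(v)^2 .
\end{align*}
```

Subtracting $f(v)^2$ from both sides gives $\mathrm{VAR}[\hat{f} \mid v] \le c \inf_{\hat{f}'} \mathrm{VAR}[\hat{f}' \mid v] + (c-1) f(v)^2$.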

An important remark is in order here. As discussed in the
introduction, in the typical scenario where the sample is likely to
provide little or no information on $f(v)$, the variance is $\Omega(f(v)^2)$, and
hence the competitive ratio as we defined it in terms of the expectation
of the square also translates to a competitive ratio of the variance.
Otherwise, when the domain can include data on which samples
are likely to reveal the value, it is not possible to obtain a universal
competitiveness result in terms of variance. One bad example is RG
on PPS samples. Interestingly, the bad examples are query specific:
for $\mathrm{RG}_2$ it is possible to get a bounded ratio in terms of variance.
More details are in the companion experimental paper [16].

4. THE LOWER BOUND FUNCTION AND 
ITS LOWER HULL 

For a function f, we define the respective lower bound function
$\underline{f}$ and the lower hull function $H_f$. We then characterize, in terms
of properties of $\underline{f}$ and $H_f$, when nonnegative unbiased estimators
exist for f and when such estimators exist that also have finite
variances or are bounded.

The lower bound function $\underline{f}(S)$: For $Z \subseteq \mathbf{V}$, we define
$\underline{f}(Z) = \inf\{f(v) \mid v \in Z\}$ to be the tightest lower bound on the values
of f on Z. We use the notation $\underline{f}(S) \equiv \underline{f}(V^*(S))$, $\underline{f}(\rho, v) \equiv
\underline{f}(V^*(\rho, v))$. When v is fixed, we use $\underline{f}^{(v)}(u) \equiv \underline{f}(u, v)$.
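On a small finite domain the lower bound function can be evaluated by brute force, enumerating the vectors consistent with the outcome. The sketch below assumes PPS thresholds $\tau_i(u) = u\tau_i^*$ with $\tau^* = (4, 4)$, the domain $\{0,1,2\}^2$ of Figure 2, and the range function as f; the helper name `lower_bound` is illustrative.

```python
from itertools import product

def lower_bound(f, u, v, tau_star, domain):
    """Infimum of f over all vectors consistent with the outcome S(u, v)."""
    thresholds = [u * t for t in tau_star]
    # A sampled entry is pinned to its value; an unsampled entry may take
    # any domain value strictly below its threshold.
    choices = [
        [vi] if vi >= th else [z for z in domain if z < th]
        for vi, th in zip(v, thresholds)
    ]
    return min(f(z) for z in product(*choices))

def rg(v):
    """Range function RG(v) = max(v) - min(v)."""
    return max(v) - min(v)

v, tau_star, domain = (2, 0), (4, 4), (0, 1, 2)
for u in (1.0, 0.5, 0.26, 0.25):
    print(u, lower_bound(rg, u, v, tau_star, domain))
```

In this example the lower bound is a left-continuous, non-increasing step function of the seed, and it reaches $f(v) = 2$ once the seed is small enough to exclude every competing vector.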

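From Lemma 2.1 we obtain that $\forall v$, $\underline{f}^{(v)}(u)$ is left-continuous,
that is:

Corollary 4.1. $\forall v\, \forall \rho \in (0, 1]$, $\lim_{\eta \to \rho^-} \underline{f}^{(v)}(\eta) = \underline{f}^{(v)}(\rho)$.

LEMMA 4.2. A nonnegative unbiased estimator $\hat{f}$ must satisfy

$$\forall v, \forall \rho, \quad \int_\rho^1 \hat{f}(u, v)\,du \le \underline{f}^{(v)}(\rho) \tag{3}$$

PROOF. Unbiased and nonnegative $\hat{f}$ must satisfy

$$\forall v, \forall \rho \in (0, 1], \quad \int_\rho^1 \hat{f}(u, v)\,du \le \int_0^1 \hat{f}(u, v)\,du = f(v) . \tag{4}$$

From the definition of $\underline{f}$, for all $\epsilon > 0$ and $\rho$, there is a vector $z^{(\epsilon)} \in
V^*(\rho, v)$ such that $f(z^{(\epsilon)}) \le \underline{f}(\rho, v) + \epsilon$. Recall that for all $u \ge \rho$,
$S(u, v) = S(u, z^{(\epsilon)})$, hence, using (4),

$$\int_\rho^1 \hat{f}(u, v)\,du = \int_\rho^1 \hat{f}(u, z^{(\epsilon)})\,du \le f(z^{(\epsilon)}) \le \underline{f}(\rho, v) + \epsilon .$$

Taking the limit as $\epsilon \to 0$ we obtain $\int_\rho^1 \hat{f}(u, v)\,du \le \underline{f}(\rho, v)$. □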

The lower hull of the lower bound function and v-optimality:

We denote the function corresponding to the lower boundary of
the convex hull (lower hull) of $\underline{f}^{(v)}$ by $H_f^{(v)}$. Our interest in the
lower hull is due to the following relation (the proof is postponed
to Section 7):

THEOREM 4.1. An estimator $\hat{f}$ is v-optimal if and only if for
$u \in [0, 1]$ almost everywhere

$$\hat{f}(u, v) = -\frac{dH_f^{(v)}(u)}{du} .$$

Moreover, when an unbiased and nonnegative estimator exists for
f, there also exists, for any data v, a nonnegative and unbiased
v-optimal estimator.

We use the notation $\hat{f}^{(v)}(u) = -\frac{dH_f^{(v)}(u)}{du}$ for the v-optimal
estimates on outcomes consistent with v. Since the lower bound
function is monotone non-increasing, so is $H_f^{(v)}$, and therefore $H_f^{(v)}$ is
differentiable almost everywhere and $\hat{f}^{(v)}$ is defined almost
everywhere. Figure 3 illustrates an example lower bound function and
the corresponding lower hull.



[Plot: curves labeled "estimate (cumulative > u)" and "lower bound function".]

Figure 3: Lower bound function $\underline{f}^{(v)}(u)$ for $u \in (0, 1]$ and
corresponding lower hull $H_f^{(v)}(u)$, which is also the integral
of the nonnegative estimator with minimum variance on $v$:
$\int_u^1 \hat{f}^{(v)}(x)\,dx$. The figure visualizes the lower bound function,
which is always left-continuous and monotone non-increasing.
The lower hull is continuous and also monotone non-increasing.
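The relation between the lower hull and the v-optimal estimates can be sketched numerically. The following is a minimal illustration, not part of the original construction: it treats a handful of hypothetical sample points $(u, \underline{f}^{(v)}(u))$ as exact values of a lower bound function (with the first point standing for $(0, f(v))$ and the last value equal to 0), computes their lower convex hull with a standard monotone-chain pass, and reads off the negated slopes as the estimates. The function names and the sample values are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull (monotone chain) of points sorted by increasing u."""
    hull: List[Point] = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def negated_slopes(hull: List[Point]) -> List[Tuple[float, float, float]]:
    """(left, right, estimate) per hull segment; the estimate is the negated slope."""
    return [
        (x1, x2, -(y2 - y1) / (x2 - x1))
        for (x1, y1), (x2, y2) in zip(hull, hull[1:])
    ]


# Hypothetical lower bound samples; (0.5, 0.4) lies above the hull and is skipped.
samples = [(0.0, 1.0), (0.25, 0.5), (0.5, 0.4), (0.75, 0.1), (1.0, 0.0)]
hull = lower_hull(samples)
estimates = negated_slopes(hull)
# Integrating the estimates over (0, 1] recovers the value at u -> 0, here f(v) = 1.
expectation = sum((right - left) * est for left, right, est in estimates)
```

In this sketch the estimates come out non-increasing in $u$, in line with the convexity of the hull, and their integral equals the hypothetical $f(v)$, which is the unbiasedness requirement.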
5. CHARACTERIZATION

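THEOREM 5.1. $f \ge 0$ has an estimator that is

• unbiased and nonnegative $\iff$

$$\forall v \in V, \quad \lim_{u \to 0^+} \underline{f}^{(v)}(u) = f(v) . \tag{5}$$

• unbiased, nonnegative, and finite variances $\iff$

$$\forall v \in V, \quad \int_0^1 \left( \frac{dH_f^{(v)}(u)}{du} \right)^2 du < \infty . \tag{6}$$

• unbiased, nonnegative, and bounded $\iff$

$$\forall v \in V, \quad \lim_{u \to 0^+} \frac{f(v) - \underline{f}^{(v)}(u)}{u} < \infty . \tag{7}$$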

We establish sufficiency in Theorem 5.1 by constructing an estimator
$\hat{f}^{(J)}$ (the J estimator) that is unbiased and nonnegative when
(5) holds, bounded when (7) holds, and has finite variances if (6)
holds. The proof of the theorem is provided in Section 6.2, following
the presentation of the J estimator in the next section.

6. THE J ESTIMATOR 

Fixing $v$, we define an estimator $\hat{f}^{(J)}(u, v)$ incrementally, starting
with $u = 1$ and such that the value at $u$ depends on values
at $u' > u$. We first define $\hat{f}^{(J)}(u, v)$ for all $u \in (\frac{1}{2}, 1]$ by
$\hat{f}^{(J)}(u, v) = 2\underline{f}(1, v)$. At each step we consider intervals of the
form $(2^{-j-1}, 2^{-j}]$, setting the estimate to the same value for all
outcomes $S(u, v)$ for $u \in (2^{-j-1}, 2^{-j}]$. Assuming the estimator
is defined for $u \ge 2^{-j}$, we extend the definition to the interval
$u \in (2^{-j-1}, 2^{-j}]$ as follows.

$$\hat{f}^{(J)}(u,v) = 0, \quad \text{if } \underline{f}(2^{-j},v) = \int_{2^{-j}}^1 \hat{f}^{(J)}(u,v)\,du$$

$$\hat{f}^{(J)}(u,v) = 2^{j+1}\left( \underline{f}(2^{-j},v) - \int_{2^{-j}}^1 \hat{f}^{(J)}(u,v)\,du \right), \quad \text{otherwise}$$
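The incremental definition above can be traced with a short sketch. This is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, the lower bound function is passed in as a Python callable, and the function $\underline{f}^{(v)}(u) = \max\{0, 0.8 - u\}^2$ is an assumed example with limit $0.64$ as $u \to 0^+$.

```python
from typing import Callable, List


def j_estimator_levels(lb: Callable[[float], float], levels: int) -> List[float]:
    """J estimate on each interval (2^-(j+1), 2^-j] for j = 0 .. levels-1."""
    values: List[float] = []
    integral = 0.0  # integral of the estimator over (2^-j, 1]
    for j in range(levels):
        right = 2.0 ** (-j)
        # Gap between the lower bound at 2^-j and the mass assigned so far.
        # max() only guards against floating-point round-off; the gap is 0
        # exactly in the "equal" case of the definition.
        gap = max(0.0, lb(right) - integral)
        estimate = 2.0 ** (j + 1) * gap
        values.append(estimate)
        integral += estimate * right / 2.0  # interval length is 2^-(j+1)
    return values


def cumulative(values: List[float]) -> float:
    """Integral of the piecewise-constant estimator over the covered range."""
    return sum(v * 2.0 ** (-(j + 1)) for j, v in enumerate(values))


def example_lb(u: float) -> float:
    """Assumed example lower bound function with limit 0.64 at u -> 0+."""
    return max(0.0, 0.8 - u) ** 2


values = j_estimator_levels(example_lb, 30)
total = cumulative(values)
```

Each step assigns exactly the missing mass to the next dyadic interval, so after processing the interval ending at $2^{-j}$ the accumulated integral equals the lower bound at $2^{-j}$, and the total approaches the limit of the lower bound function as more levels are added.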



[Plot: curves labeled "Cumulative J estimates" and "Lower bound function"; horizontal axis ticks at 1/64, 1/32, 1/16, 1/8.]

Figure 4: Lower bound function $\underline{f}^{(v)}(u)$ for $u \in (0,1]$
and cumulative J estimates on outcomes consistent with $v$,
$\int_u^1 \hat{f}^{(J)}(x, v)\,dx$. The J estimate $\hat{f}^{(J)}(u, v)$ is the negated slope.



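LEMMA 6.1. The J estimator is well defined, is unbiased and
nonnegative when (5) holds, and satisfies

$$\forall \rho\, \forall v, \quad \int_\rho^1 \hat{f}^{(J)}(u,v)\,du \le \underline{f}(\rho,v) \tag{8}$$

$$\forall \rho\, \forall v, \quad \int_\rho^1 \hat{f}^{(J)}(u,v)\,du \ge \underline{f}(4\rho,v) \tag{9}$$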

PROOF. We first argue that the constructions, which are presented
relative to a particular choice of the data $v$, produce a consistent
estimator. For that, we have to show that for every outcome
$S(\rho, v)$, the assigned value is the same for all vectors $z \in V^*(\rho, v)$.
Since $\underline{f}(\rho, v) = \underline{f}(\rho, z)$ for all $z \in V^*(\rho, v)$, in particular
this holds for $\rho = 2^{-j}$, so the setting of the estimator for
$u \in (2^{-j-1}, 2^{-j}]$ is the same for all $S(u, z)$ where $z \in V^*(2^{-j}, v)$
(also when $z \notin V^*(2^{-j-1}, v)$). Therefore, the resulting estimator
is consistently defined.

We show that the construction maintains the following invariant
for $j \ge 1$:

$$\underline{f}(2^{-j+1}, v) = \int_{2^{-j}}^1 \hat{f}^{(J)}(u,v)\,du \tag{10}$$

From the first step of the construction,

$$\int_{1/2}^1 \hat{f}^{(J)}(u,v)\,du = \int_{1/2}^1 2\underline{f}(1,v)\,du = \underline{f}(1,v) .$$

So (10) holds for $j = 1$. Now we assume by induction that (10)
holds for $j$ and establish that it holds for $j + 1$. If $\underline{f}(2^{-j}, v) = \underline{f}(2^{-j+1}, v)$,
then by the definition of the J estimator $\int_{2^{-j-1}}^{2^{-j}} \hat{f}^{(J)}(u,v)\,du = 0$
and we get

$$\int_{2^{-j-1}}^1 \hat{f}^{(J)}(u,v)\,du = \int_{2^{-j}}^1 \hat{f}^{(J)}(u,v)\,du = \underline{f}(2^{-j+1},v) = \underline{f}(2^{-j},v) .$$

Otherwise, by definition, $\int_{2^{-j-1}}^{2^{-j}} \hat{f}^{(J)}(u,v)\,du = \underline{f}(2^{-j},v) - \underline{f}(2^{-j+1},v)$
and hence $\int_{2^{-j-1}}^1 \hat{f}^{(J)}(u,v)\,du = \underline{f}(2^{-j},v)$ and
(10) holds for $j + 1$.

From monotonicity, $\underline{f}(2^{-j+1}, v) \le \underline{f}(2^{-j}, v)$, and when
substituting (10) in the definition of the estimator we obtain that the
estimates are always nonnegative.

To establish (8) we use (10), the relation $2^{\lfloor \log_2 \rho \rfloor} \le \rho \le 2^{1+\lfloor \log_2 \rho \rfloor}$,
and monotonicity of $\underline{f}(u, v)$, to obtain

$$\int_\rho^1 \hat{f}^{(J)}(u,v)\,du \le \int_{2^{\lfloor \log_2 \rho \rfloor}}^1 \hat{f}^{(J)}(u,v)\,du = \underline{f}(2^{1+\lfloor \log_2 \rho \rfloor}, v) \le \underline{f}(\rho,v) .$$

Similarly, we establish (9) using (10) and the relation
$2^{-1+\lceil \log_2 \rho \rceil} \le \rho \le 2^{\lceil \log_2 \rho \rceil}$:

$$\int_\rho^1 \hat{f}^{(J)}(u,v)\,du \ge \int_{2^{\lceil \log_2 \rho \rceil}}^1 \hat{f}^{(J)}(u,v)\,du = \underline{f}(2^{1+\lceil \log_2 \rho \rceil}, v) \ge \underline{f}(4\rho,v) .$$

Lastly, unbiasedness follows from (5) and combining (8) and (9):

$$\underline{f}(\rho,v) \ge \int_\rho^1 \hat{f}^{(J)}(u,v)\,du \ge \underline{f}(4\rho,v) ,$$

when we take the limit as $\rho \to 0$. □

Computing the J estimate from an outcome S: From the outcome
we know the seed value $\rho$ and the lower bound function
$\underline{f}^{(v)}(u)$ for all $u \ge \rho$ (recall that the lower bound on this range
is the same for all data $v \in V^*(S)$, so we do not need to know the
data $v$). We compute $i \leftarrow \lfloor -\log_2 \rho \rfloor$ and use the invariant (10)
in the definition of J, obtaining the J estimate
$2^{i+1}\left( \underline{f}(2^{-i}, v) - \underline{f}(2^{1-i}, v) \right)$.
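The computation from a single outcome can be sketched as follows. This is a minimal illustration under assumptions: the function name and signature are hypothetical, the lower bound function is supplied as a callable that only needs to be valid for arguments at least $\rho$, and the case $i = 0$ is handled with the first step of the construction ($2\underline{f}(1, v)$) rather than by evaluating the lower bound beyond $u = 1$.

```python
import math
from typing import Callable


def j_estimate(rho: float, lb: Callable[[float], float]) -> float:
    """J estimate for seed rho in (0, 1], given the lower bound function on [rho, 1]."""
    i = math.floor(-math.log2(rho))  # rho lies in (2^-(i+1), 2^-i]
    if i == 0:
        # First step of the construction: the estimate on (1/2, 1] is 2 * lb(1).
        return 2.0 * lb(1.0)
    # Invariant (10): the mass already assigned on (2^-i, 1] equals lb(2^(1-i)).
    return 2.0 ** (i + 1) * (lb(2.0 ** (-i)) - lb(2.0 ** (1 - i)))


def example_lb(u: float) -> float:
    """Assumed example lower bound function, used only for illustration."""
    return max(0.0, 0.8 - u) ** 2


estimate_at_03 = j_estimate(0.3, example_lb)  # seed in (1/4, 1/2], so i = 1
estimate_at_02 = j_estimate(0.2, example_lb)  # seed in (1/8, 1/4], so i = 2
```

Only two evaluations of the lower bound function are needed, both at points that are at least $\rho$, which is why the estimate can be computed from the outcome alone.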

Example: We demonstrate the application of the J estimator through
a simple example. The data domain in our example includes pairs
$(v_1, v_2)$ of nonnegative reals. We are interested in
$f(v_1, v_2) = (\max\{v_1 - v_2, 0\})^2$, which sum aggregates to "one sided"
Euclidean distance. The data is PPS sampled with threshold $\tau = 1$
for both entries, therefore, the sampling probability of entry $i$ is
$\min\{1, v_i\}$. Sampling is coordinated, which means that for a seed
$\rho \in U[0, 1]$, entry $i$ is sampled if and only if $v_i \ge \rho$. The outcome
$S$ includes the values of the sampled entries and the seed value $\rho$. If
no entry is sampled, or only the second entry is sampled, the lower
bound function for $x \ge \rho$ is 0 and the J estimate is 0. If only the
first entry is sampled, the lower bound function, for $x \ge \rho$, and the
J estimate for the outcome are

$$\underline{f}^{(v)}(x) = \max\{0, v_1 - x\}^2$$

$$\hat{f}^{(J)}(S) = 2^{\lfloor -\log_2 \rho \rfloor + 1} \left( \max\{0, v_1 - 2^{-\lfloor -\log_2 \rho \rfloor}\}^2 - \max\{0, v_1 - 2^{1-\lfloor -\log_2 \rho \rfloor}\}^2 \right)$$

If both entries are sampled, the lower bound function for $x \ge \rho$,
and accordingly, the J estimate are

$$\underline{f}^{(v)}(x) = \max\{0, v_1 - \max\{v_2, x\}\}^2$$

$$\hat{f}^{(J)}(S) = 2^{\lfloor -\log_2 \rho \rfloor + 1} \left( \max\{0, v_1 - \max\{v_2, 2^{-\lfloor -\log_2 \rho \rfloor}\}\}^2 - \max\{0, v_1 - \max\{v_2, 2^{1-\lfloor -\log_2 \rho \rfloor}\}\}^2 \right)$$
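The example can be checked numerically with a small sketch. Everything below is an illustrative assumption layered on the description above: the outcome is represented as a tuple (seed, sampled $v_1$ or None, sampled $v_2$ or None), the function names are hypothetical, and seeds in $(\frac{1}{2}, 1]$ use the first step of the construction, $2\underline{f}(1, v)$. Because the J estimate is constant on each dyadic interval of seeds, its expectation is computed as a weighted sum with one representative seed per interval.

```python
import math
from typing import Optional, Tuple

Outcome = Tuple[float, Optional[float], Optional[float]]


def sample(v1: float, v2: float, rho: float) -> Outcome:
    """Coordinated PPS sample with threshold 1: entry i is kept iff v_i >= rho."""
    return (rho, v1 if v1 >= rho else None, v2 if v2 >= rho else None)


def lower_bound(outcome: Outcome, x: float) -> float:
    """Lower bound on (max{v1 - v2, 0})^2 at seed x >= rho, from the outcome."""
    _, s1, s2 = outcome
    if s1 is None:
        return 0.0  # v1 < rho <= x, so the function may be 0
    cap = x if s2 is None else max(s2, x)
    return max(0.0, s1 - cap) ** 2


def j_estimate(outcome: Outcome) -> float:
    rho = outcome[0]
    i = math.floor(-math.log2(rho))
    if i == 0:
        return 2.0 * lower_bound(outcome, 1.0)
    hi = lower_bound(outcome, 2.0 ** (-i))
    lo = lower_bound(outcome, 2.0 ** (1 - i))
    return 2.0 ** (i + 1) * (hi - lo)


def expected_j(v1: float, v2: float, levels: int = 40) -> float:
    """Expectation over the seed, one representative per dyadic interval."""
    total = 0.0
    for i in range(levels):
        rho = 0.75 * 2.0 ** (-i)  # any seed inside (2^-(i+1), 2^-i]
        total += 2.0 ** (-(i + 1)) * j_estimate(sample(v1, v2, rho))
    return total
```

For instance, `expected_j(0.8, 0.3)` should agree with $f(0.8, 0.3) = 0.25$ up to floating-point error, which is the unbiasedness property of Lemma 6.1 on this data.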



6.1 Competitiveness of the J estimator 

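THEOREM 6.1. The estimator $\hat{f}^{(J)}$ is c-competitive ($c \le 84$).

PROOF. We need to show

$$\forall v, \quad \int_0^1 \left( \hat{f}^{(J)}(u,v) \right)^2 du \le 84 \int_0^1 \left( \hat{f}^{(v)}(u) \right)^2 du .$$

Let $\rho = 2^{-j}$ for some integer $j \ge 0$. Recall the construction of
$\hat{f}^{(J)}$ on an interval $(\rho/2, \rho]$. The value is fixed in the interval and is
either 0, if $\int_\rho^1 \hat{f}^{(J)}(u,v)\,du = \underline{f}^{(v)}(\rho)$, or is
$2\,\frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(J)}(u,v)\,du}{\rho}$.

Using (9) in Lemma 6.1 we obtain that for $u \in (\rho/2, \rho]$,

$$\hat{f}^{(J)}(u,v) \le 2\,\frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(J)}(u,v)\,du}{\rho} \le 2\,\frac{\underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho)}{\rho} .$$

Thus,

$$\int_{\rho/2}^{\rho} \left( \hat{f}^{(J)}(u,v) \right)^2 du \le \frac{\rho}{2} \left( 2\,\frac{\underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho)}{\rho} \right)^2 = \frac{2}{\rho} \left( \underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho) \right)^2 \tag{11}$$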
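We now bound the expectation of $\hat{f}^{(v)}$ on $u \in (\rho/2, 4\rho)$ from below.

$$\int_{\rho/2}^{4\rho} \hat{f}^{(v)}(u)\,du = \int_{\rho}^{4\rho} \hat{f}^{(v)}(u)\,du + \int_{\rho/2}^{\rho} \hat{f}^{(v)}(u)\,du$$

$$\ge \int_{\rho}^{4\rho} \hat{f}^{(v)}(u)\,du + \frac{\rho}{2} \hat{f}^{(v)}(\rho) \tag{12}$$

$$\ge \int_{\rho}^{4\rho} \hat{f}^{(v)}(u)\,du + \frac{\rho}{2} \cdot \frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(v)}(u)\,du}{\rho} \tag{13}$$

$$= \frac{1}{2} \int_{\rho}^{4\rho} \hat{f}^{(v)}(u)\,du + \frac{1}{2} \left( \underline{f}^{(v)}(\rho) - \int_{4\rho}^1 \hat{f}^{(v)}(u)\,du \right)$$

$$\ge \frac{1}{2} \left( \underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho) \right) \tag{14}$$

Inequality (12) follows from monotonicity of the function $\hat{f}^{(v)}$.
Inequality (13) follows from the definition of $\hat{f}^{(v)}$ as the negated derivative
of the lower hull of $\underline{f}^{(v)}$: $\hat{f}^{(v)}(\rho) \ge \frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(v)}(u)\,du}{\rho}$. (More
precisely, using the explicit definition (24) of the v-optimal estimates
in Section 7,
$\hat{f}^{(v)}(\rho) = \inf_{0 \le \eta < \rho} \frac{\lim_{u \to \eta^+} \underline{f}^{(v)}(u) - \int_\rho^1 \hat{f}^{(v)}(u)\,du}{\rho - \eta} \ge \inf_{0 \le \eta < \rho} \frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(v)}(u)\,du}{\rho - \eta} = \frac{\underline{f}^{(v)}(\rho) - \int_\rho^1 \hat{f}^{(v)}(u)\,du}{\rho}$.)
Lastly, inequality (14) uses $\int_{4\rho}^1 \hat{f}^{(v)}(u)\,du \le \underline{f}^{(v)}(4\rho)$, which follows from
nonnegativity of $\hat{f}^{(v)}$ and Lemma 4.2.

Dividing both sides by $3.5\rho$ we obtain a lower bound on the average
value of $\hat{f}^{(v)}(u)$ in the interval $[\rho/2, 4\rho]$.

$$\frac{1}{3.5\rho} \int_{\rho/2}^{4\rho} \hat{f}^{(v)}(u)\,du \ge \frac{1}{7\rho} \left( \underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho) \right) \tag{15}$$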

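We next show that the integral of the square of $\hat{f}^{(J)}(u, v)$ on $u \in (\rho/2, \rho]$ is at
most some constant times the integral of the square of $\hat{f}^{(v)}$
on $u \in (\rho/2, 4\rho)$.

$$\int_{\rho/2}^{4\rho} \hat{f}^{(v)}(u)^2\,du \ge 3.5\rho \left( \frac{1}{3.5\rho} \int_{\rho/2}^{4\rho} \hat{f}^{(v)}(u)\,du \right)^2 \tag{16}$$

$$\ge 3.5\rho \left( \frac{1}{7\rho} \left( \underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho) \right) \right)^2 \tag{17}$$

$$= 3.5\rho \left( \frac{1}{7\rho} \right)^2 \left( \underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(4\rho) \right)^2 \ge \frac{1}{28} \int_{\rho/2}^{\rho} \hat{f}^{(J)}(u,v)^2\,du \tag{18}$$

Inequality (16) uses the fact that for any random variable $X$, $(E[X])^2 \le E[X^2]$,
applied to $\hat{f}^{(v)}(u)$ for $u \in (\rho/2, 4\rho]$. Inequality (17) follows
from (15). Lastly, Inequality (18) follows from (11). We obtain

$$\int_0^1 \hat{f}^{(J)}(u,v)^2\,du = \sum_{i=0}^{\infty} \int_{2^{-i-1}}^{2^{-i}} \hat{f}^{(J)}(u,v)^2\,du \le 28 \sum_{i=0}^{\infty} \int_{2^{-i-1}}^{\min\{1, 2^{-i+2}\}} \hat{f}^{(v)}(u)^2\,du \le 28 \cdot 3 \int_0^1 \hat{f}^{(v)}(u)^2\,du .$$

□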
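The bound of Theorem 6.1 can be sanity-checked on a case where both sides have simple forms. The sketch below rests on assumptions introduced for illustration: the data is $v = (v_1, 0)$ with $v_1 \le 1$ in the one-sided distance example, so that $\underline{f}^{(v)}(u) = \max\{0, v_1 - u\}^2$. This function is convex and non-increasing on $(0, 1]$, hence it coincides with its lower hull, the v-optimal estimate is $2\max\{0, v_1 - u\}$, and its second moment is $4v_1^3/3$. The function names are hypothetical.

```python
def lb(u: float, v1: float) -> float:
    """Assumed lower bound function for data (v1, 0) with v1 <= 1."""
    return max(0.0, v1 - u) ** 2


def j_second_moment(v1: float, levels: int = 50) -> float:
    """Integral over (0, 1] of the squared J estimate (truncated at 2^-levels)."""
    total = 0.0
    for i in range(levels):
        right = 2.0 ** (-i)
        # Mass already assigned above 2^-i equals lb(2^(1-i)) by invariant (10);
        # nothing has been assigned before the first step (i = 0).
        assigned = lb(2.0 * right, v1) if i > 0 else 0.0
        estimate = 2.0 ** (i + 1) * (lb(right, v1) - assigned)
        total += 2.0 ** (-(i + 1)) * estimate ** 2
    return total


def optimal_second_moment(v1: float) -> float:
    """Integral over (0, v1] of (2 (v1 - u))^2, valid for this convex lower bound."""
    return 4.0 * v1 ** 3 / 3.0


def competitive_ratio(v1: float) -> float:
    return j_second_moment(v1) / optimal_second_moment(v1)
```

On such data the ratio has to lie between 1 (the v-optimal estimator minimizes the second moment among nonnegative unbiased estimators) and the worst-case constant 84 of the theorem.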
6.2 Proof of Theorem 5.1

PROOF. • "⇒" (5): From Lemma 4.2, an unbiased and nonnegative
estimator $\hat{f}$ must satisfy (3). Fixing $v$ in (3) and taking
the limit as $\rho \to 0$ we obtain that $E[\hat{f} \mid v] = \int_0^1 \hat{f}(u,v)\,du \le \lim_{u \to 0^+} \underline{f}^{(v)}(u)$.
Combining with unbiasedness, $E[\hat{f} \mid v] = f(v)$, and with
$\underline{f}^{(v)}(u) \le f(v)$, which holds by the definition of the lower bound function,
we obtain (5).

• "⇐" (5): Follows immediately from Lemma 6.1.

• "⇒" (7): We bound from below the contribution to the expectation
of unbiased and nonnegative $\hat{f}$ of outcomes $S(u, v)$ for $u \le \rho$:
$\int_0^\rho \hat{f}(u,v)\,du = \int_0^1 \hat{f}(u,v)\,du - \int_\rho^1 \hat{f}(u,v)\,du \ge f(v) - \underline{f}^{(v)}(\rho)$. The
last inequality follows from unbiasedness and nonnegativity (3).
Hence, the average value of $\hat{f}(u, v)$ when $u < \rho$ must be at least

$$\frac{f(v) - \underline{f}^{(v)}(\rho)}{\rho}$$

and thus, considering all possible values of $\rho > 0$, we obtain that $\hat{f}$
can be bounded only if it satisfies (7).

• "⇐" (7): Note that (7) ⇒ (5), and therefore the conditions
of Lemma 6.1 are satisfied and the J estimator is well defined, nonnegative,
and unbiased. It remains to show that given (7), or the
equivalent statement

$$\forall v\ \exists c < \infty\ \forall u, \quad f(v) - \underline{f}^{(v)}(u) \le cu , \tag{19}$$

the J estimator is bounded. Fix $v$ and let $c$ be as in (19). Then

$$\hat{f}^{(J)}(\rho,v) \le 2\,\frac{\underline{f}^{(v)}(\rho/2) - \int_{2\rho}^1 \hat{f}^{(J)}(u,v)\,du}{\rho} \tag{20}$$

$$\le 2\,\frac{f(v) - \underline{f}^{(v)}(8\rho)}{\rho} = 16\,\frac{f(v) - \underline{f}^{(v)}(8\rho)}{8\rho} \tag{21}$$

$$\le 16c \tag{22}$$

Inequality (20) is from the definition of the J estimator. Inequality
(21) uses the definition of the lower bound function and (9). Lastly,
(22) follows from our assumption (19).
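The boundedness argument can be illustrated on the same kind of data used in the example of Section 6. The sketch below is an assumption-laden illustration, not part of the proof: it takes data $v = (v_1, 0)$ with $v_1 \le 1$, for which $f(v) = v_1^2$ and $\underline{f}^{(v)}(u) = \max\{0, v_1 - u\}^2$, so that $f(v) - \underline{f}^{(v)}(u) \le 2 v_1 u$ and (19) holds with $c = 2v_1$. It then checks that the J estimates on the first 40 dyadic intervals stay below $16c$. The function names are hypothetical.

```python
def lb(u: float, v1: float) -> float:
    """Assumed lower bound function for data (v1, 0) with v1 <= 1."""
    return max(0.0, v1 - u) ** 2


def max_j_estimate(v1: float, levels: int = 40) -> float:
    """Largest J estimate over the dyadic intervals (2^-(i+1), 2^-i], i < levels."""
    best = 0.0
    for i in range(levels):
        right = 2.0 ** (-i)
        assigned = lb(2.0 * right, v1) if i > 0 else 0.0  # invariant (10)
        best = max(best, 2.0 ** (i + 1) * (lb(right, v1) - assigned))
    return best


def bound_from_19(v1: float) -> float:
    """16 * c with c = 2 * v1, the constant that satisfies (19) for this data."""
    return 16.0 * 2.0 * v1
```

The observed maximum is well below $16c$ on this data, which is consistent with (22) being a worst-case bound rather than a tight one.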

• "⇔" (6): From Theorem 4.1, for all $v$, (6), which is square-integrability
of $\hat{f}^{(v)}(u)$, is necessary for existence of a nonnegative
unbiased estimator with finite variance for $v$. Sufficiency follows
from the proof of Theorem 6.1, which shows that for all $v$, the expectation
of the square of the J estimator is at most a constant times
the minimum possible, and (2), which states that the variance is
bounded if and only if the expectation of the square is bounded. □

7. POINTWISE VARIANCE OPTIMALITY 

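We work with partial specification of (nonnegative unbiased) estimators.
A partial specification $\hat{f}$ is defined over a subset $\mathcal{S}$ of
outcomes that is closed under the containment order $V^*(S)$:

$$\forall v\ \exists \rho_v \in [0, 1]: \quad S(u, v) \in \mathcal{S} \text{ a.e. for } u > \rho_v \ \wedge\ S(u, v) \notin \mathcal{S} \text{ a.e. for } u \le \rho_v .$$

We also require that $\hat{f} : \mathcal{S} \ge 0$, that is, estimates are nonnegative
when specified, and that

$$\forall v,\ \rho_v > 0 \implies \int_{\rho_v}^1 \hat{f}(u,v)\,du \le f(v) \tag{23a}$$

$$\forall v,\ \rho_v = 0 \implies \int_{\rho_v}^1 \hat{f}(u,v)\,du = f(v) \tag{23b}$$

If $\rho_v = 0$, we say that the estimator is fully specified for $v$.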

We show that a partial specification $\hat{f}$ can always be extended to
an unbiased nonnegative estimator:

LEMMA 7.1. If $f$ satisfies (5) (has a nonnegative unbiased estimator)
and $\hat{f}$ is partially specified then it can be extended to an
unbiased nonnegative estimator.

PROOF. The "only if" direction follows from Lemma 4.2. The
"if" direction follows from a slight modification of the J estimator
construction, in which we "start" to specify the estimator at $\rho_v$
instead of 1. □

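THEOREM 7.1. If $\hat{f}$ is a partially specified estimator, then an
extension of $\hat{f}$ to the outcomes $S(\rho, v)$ for $\rho \in (0, \rho_v]$ is a partially
specified estimator and minimizes $\mathrm{VAR}[\hat{f} \mid v]$ (over all unbiased and
nonnegative estimators which are extensions of the partial specification
$\hat{f}$) if and only if, for $u \in (0, \rho_v]$ almost everywhere,

$$\hat{f}(u,v) = \inf_{0 \le \eta < u} \frac{\lim_{u' \to \eta^+} \underline{f}^{(v)}(u') - \int_u^1 \hat{f}(x,v)\,dx}{u - \eta} \tag{24}$$

Moreover, a solution $\hat{f}$ of (24) always exists and $H(u) = \int_u^1 \hat{f}(x, v)\,dx$
for $u \in (0, \rho_v]$ is unique.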



PROOF. From Lemma 7.1, an extended $\hat{f}$ to outcomes $S(\eta, v)$
for $\eta \in (0, \rho_v]$ is a partially specified estimator if and only if

$$\forall u \le \rho_v, \quad \int_u^1 \hat{f}(x,v)\,dx \le \underline{f}^{(v)}(u) \tag{25a}$$

$$\int_0^1 \hat{f}(x,v)\,dx = f(v) \tag{25b}$$

From the structure of the constraints (25), if we take a solution
$\hat{f}(u, v)$ and shift "weight" to lower $u$ values (decreasing $\hat{f}$ on
higher $u$ values and increasing on lower ones so that the total contribution
is the same) then the result is also a solution. If the shift results
in decrease of higher values and increase of lower values, then
the variance decreases. Therefore, a minimum variance solution of
(25) must be monotone nonincreasing with $u$ almost everywhere,
meaning that monotonicity holds if we exclude from $(0, \rho_v]$ a set
of points with zero measure.

$$\exists \text{ zero measure } U_0 \subset (0, \rho_v]\ \forall u_1, u_2 \in (0, \rho_v] \setminus U_0, \quad u_1 < u_2 \implies \hat{f}(u_1,v) \ge \hat{f}(u_2,v) \tag{25c}$$

Looking now only at extensions satisfying (25a), (25b), and (25c),
we can decrease variance if we could "shift weight" from the higher
estimates on lower $u$ values to the lower estimates on higher $u$ values.
Such shifts, however, may violate (25a). Therefore, an estimator
has minimum variance only if all such shifts result in violation,
that is, after excluding a zero measure set, between any two points
with strictly different estimate values there must be a point $\eta$ such
that $\int_\eta^1 \hat{f}(u, v)\,du$ "touches" the lower bound at $\eta$. Formally,

$$\exists \text{ zero measure } U_0 \subset (0, \rho_v]\ \forall u_1, u_2 \in (0, \rho_v] \setminus U_0, \quad u_1 < u_2 \wedge \hat{f}(u_1,v) > \hat{f}(u_2,v) \implies \exists \eta \in [u_1, u_2],\ \int_\eta^1 \hat{f}(u,v)\,du = \lim_{u \to \eta^+} \underline{f}^{(v)}(u) . \tag{25d}$$

We established that (25) ((25a)-(25d)) are necessary for an extension
to have minimum variance. We also argued that any extension
satisfying (25a) and (25b) can be transformed to one satisfying
(25c) and (25d) without increasing its variance. Therefore, if we
can establish that (25) have a unique solution (up to equivalence),
the solution must have minimum variance.

Geometrically, an extension $\hat{f}$ satisfies either (25) or (24) if and
only if the function $H(u) = \int_u^1 \hat{f}(x,v)\,dx$ is the lower boundary
of the convex hull of the lower bound function $\underline{f}^{(v)}(x)$ for
$x \in (0, \rho_v]$ and the point $(\rho_v, \int_{\rho_v}^1 \hat{f}(u, v)\,du)$. This implies existence
and uniqueness of the solution and also that it has the desired
properties.

We provide an algebraic proof. We start by exploring the structure
which an extension of $\hat{f}(u, v)$ to $u \in (0, \rho_v]$, which satisfies
(24), must have. We subsequently use the structure to establish existence
and nonnegativity, uniqueness (a.e.), that it satisfies (25),
and lastly, to show that any function that violates (24) must also
violate (25).

Structure: We say that $\rho$ is an LB (lower bound) point of $\hat{f}(u, v)$
if the infimum of the solution of (24) at $\rho$ is achieved as $\eta \to \rho^-$,
that is,

$$\hat{f}(\rho,v) = \lim_{\eta \to \rho^-} \frac{\lim_{u \to \eta^+} \underline{f}^{(v)}(u) - \int_\rho^1 \hat{f}(u,v)\,du}{\rho - \eta} .$$

Otherwise, if the infimum is approached when $\eta < \rho - \epsilon$ for some
$\epsilon > 0$, we say that $\rho$ is a gap point. Note that it is possible for $\rho$
to be both an LB and a gap point if the infimum is approached at
multiple places. If

$$\int_\rho^1 \hat{f}(u,v)\,du < \lim_{\eta \to \rho^-} \underline{f}^{(v)}(\eta) = \underline{f}^{(v)}(\rho) ,$$

then $\rho$ must be a gap point, and if equality holds, $\rho$ can be either
an LB or a gap point (or both). The equality follows using Corollary 4.1
(left continuity of $\underline{f}^{(v)}$).

We use this classification of points to partition $(0, \rho_v]$ into LB
and gap subintervals of the form $(y, \rho]$, that is, open to the left and
closed to the right.

LB subintervals are maximal subintervals containing exclusively
LB points (which can double as gap points) that have the form
$(y, \rho]$. From left continuity and monotonicity of $\underline{f}^{(v)}$, if $x$ is an
LB point and not a gap point then there is some $\epsilon > 0$ such that
all points in $(x - \epsilon, x]$ are also LB points. Thus all LB points that are not gap
points must be part of LB subintervals and these subintervals are
open to the left and closed to the right (the point $y$ is a gap point
which may double as an LB point). Alternatively, we can identify
LB subintervals as maximal such that

$$\forall x \in (y, \rho], \quad \int_x^1 \hat{f}(u,v)\,du = \underline{f}^{(v)}(x) .$$

Gap subintervals are maximal that satisfy

$$\forall x \in (y, \rho), \quad \int_x^1 \hat{f}(u,v)\,du < \lim_{\eta \to x^+} \underline{f}^{(v)}(\eta) . \tag{26}$$

Note that a consecutive interval of gap points may consist of multiple
back-to-back gap subintervals.

We can verify that the boundary points $b \in \{y, \rho\}$ of both LB
and gap subintervals satisfy

$$\int_b^1 \hat{f}(u,v)\,du = \lim_{x \to b^+} \underline{f}^{(v)}(x) \tag{27}$$

Visually, LB intervals are segments where $\int_u^1 \hat{f}(x, v)\,dx$ "identifies"
with the lower bound function whereas gap intervals are linear
segments where it "skips" between two points where it touches (in
the sense of limit from the right) the (left-continuous) lower bound
function. Figure 3 illustrates a partition of an example lower bound
function and its lower hull into gap and LB subintervals. From left
to right, there are two gap subintervals, an LB subinterval, a gap
subinterval, and finally a trivial LB subinterval (where the LB and
estimates are 0).

Existence, nonnegativity, and (25a):

For some $0 < \rho \le \rho_v$, let $\hat{f}(u, v)$ be an extension which satisfies
(24) and (25a) for all $u \in (\rho, \rho_v]$. We show that there exists $y < \rho$
such that the solution can be extended to $(y, \rho]$, that is, (24) and
(25a) are satisfied also for $u \in (y, \rho]$.

Consider the solution of (24) at $u = \rho$. If the infimum is attained
at $\eta < \rho$, let $y$ be the infimum over points $\eta$ that attain the infimum
of (24) at $\rho$. We can extend the solution of (24) to the interval $(y, \rho]$,
which is a gap interval (or the prefix of one). The solution is fixed
throughout the interval:

$$\forall x \in (y, \rho], \quad \hat{f}(x,v) = \frac{\lim_{\eta \to y^+} \underline{f}^{(v)}(\eta) - \lim_{\eta \to \rho^+} \underline{f}^{(v)}(\eta)}{\rho - y} . \tag{28}$$

Assume now that the infimum of the solution of (24) at $\rho$ is attained
as $\eta \to \rho^-$ and not at any $\eta < \rho$. Let $y$ be the supremum of
points $x < \rho$ such that

$$\inf_{0 \le \eta < x} \frac{\underline{f}^{(v)}(\eta) - \lim_{u \to x^+} \underline{f}^{(v)}(u)}{x - \eta} < \lim_{\eta \to x^-} \frac{\underline{f}^{(v)}(\eta) - \lim_{u \to x^+} \underline{f}^{(v)}(u)}{x - \eta} \tag{29}$$

(We allow the rhs to be $+\infty$.) We have $y < \rho$ and can extend the
solution to $(y, \rho]$. The extension satisfies $\int_x^1 \hat{f}(u, v)\,du = \underline{f}^{(v)}(x)$
and $(y, \rho]$ is an LB subinterval or a prefix of one. The actual estimator
values on $(y, \rho]$ are the negated left derivative of $\underline{f}^{(v)}(u)$:

$$\forall x \in (y, \rho], \quad \hat{f}(x,v) = \lim_{\eta \to x^-} \frac{\underline{f}^{(v)}(\eta) - \underline{f}^{(v)}(x)}{x - \eta} .$$

In both cases (where the extension corresponds to an LB or gap
subinterval), the solution on $(y, \rho]$ exists, is nonnegative, and satisfies
(25a) for $u \in (y, \rho]$.

Thus, starting from $u = \rho_v$, we can iterate the process and compute
a solution on a suffix of $(0, \rho_v]$ that is partitioned into gap and
LB intervals. The sum of sizes of intervals, however, may converge
to a value $< \rho_v$ and thus the process may not converge to covering
$(0, \rho_v]$. To establish existence, assume to the contrary that there is
no solution that covers $(0, \rho_v]$. Let $x$ be the infimum such that there
are solutions on $(x, \rho_v]$ but there are no solutions for $(x - \epsilon, \rho_v]$ for
all $\epsilon > 0$. Let $\hat{f}(u, v)$ be a solution on $u \in (x, \rho_v]$ and consider
its partition to LB and gap intervals. For every $\epsilon > 0$, the interval
$[x, x + \epsilon)$ must contain a left boundary point $y_\epsilon$ of some subinterval
(it is possible that $y_\epsilon = x$). All boundary points satisfy
$\int_{y_\epsilon}^1 \hat{f}(u, v)\,du \le \underline{f}^{(v)}(y_\epsilon)$. We have

$$\int_x^1 \hat{f}(u,v)\,du = \lim_{\epsilon \to 0} \int_{y_\epsilon}^1 \hat{f}(u,v)\,du \le \lim_{\epsilon \to 0} \underline{f}^{(v)}(y_\epsilon) = \lim_{\eta \to x^+} \underline{f}^{(v)}(\eta) \le \underline{f}^{(v)}(x) .$$

The last inequality follows from monotonicity of $\underline{f}^{(v)}(u)$. Thus,
we can apply our extension process starting from $\rho = x$ and extend
the solution to $(y, x]$ for some $y < x$, contradicting our assumption
and establishing existence.

Uniqueness: Assume there are two different solutions and let
$x \in (0, \rho_v)$ be the supremum of values on which they differ. Consider
one of the solutions. The point $x$ cannot be an interior point of a (gap or LB)
subinterval because the estimator is determined on the subinterval
from values on the right boundary, which is the same for both solutions.
If $x$ is a right boundary point, $\int_x^1 \hat{f}(u, v)\,du$ is the same for
both functions and thus the left boundary and the solution on the
interval are uniquely determined and must be the same for both
functions, contradicting our assumption.

Satisfies (25b) (unbiasedness): From (5) (a necessary and sufficient
condition for the existence of an unbiased nonnegative estimator) and (27),

$$\int_0^1 \hat{f}(u,v)\,du = \lim_{u \to 0^+} \underline{f}^{(v)}(u) = f(v) .$$

Satisfies (25c) and (25d): Within an LB subinterval the estimator
must be nonincreasing, since otherwise we obtain a contradiction to
the definition of an LB point. The fixed values on two consecutive
gap intervals $(y, x]$ and $(x, \rho]$ must also be nonincreasing because
otherwise the infimum of the solution at $\rho$ could not be obtained at
$x$ since a strictly lower value is obtained at $y$.
The only solution of (25): Let $\hat{f}^{(v)}(u) \equiv \hat{f}(u, v)$ where $\hat{f}$ is a
solution of (24). Let $\hat{f}'$ be another extension which satisfies (25a)
and (25b) (and thus constitutes a partially specified estimator that is
fully specified on $v$) but is not equivalent to $\hat{f}^{(v)}$, that is, for some
$\rho$,

$$\int_\rho^{\rho_v} \hat{f}^{(v)}(u)\,du \ne \int_\rho^{\rho_v} \hat{f}'(u,v)\,du .$$

We show that $\hat{f}'$ must violate (25c) or (25d).
We first show that if

$$\int_\rho^{\rho_v} \hat{f}^{(v)}(u)\,du < \int_\rho^{\rho_v} \hat{f}'(u,v)\,du \tag{30}$$

then monotonicity (25c) must be violated. Let $\bar{\rho}$ be the supremum
of points satisfying (30). Then $\bar{\rho}$ must be a gap point and some
neighborhood to its left must be gap points. Let $y < \bar{\rho}$ be the left
boundary point of this gap interval. Recall that $\hat{f}^{(v)}$ is constant on
a gap interval.

Let $\rho \in (y, \bar{\rho})$ be such that (30) holds. There must be a measurable
set in $[\rho, \bar{\rho}]$ on which $\hat{f}' > \hat{f}^{(v)}$. Since $y$ is a left boundary of a gap
interval, from (27),

$$\int_y^1 \hat{f}^{(v)}(u)\,du = \lim_{u \to y^+} \underline{f}^{(v)}(u) ,$$

and therefore, since $\hat{f}'$ satisfies (25a), $\int_y^{\rho_v} \hat{f}'(u,v)\,du \le \int_y^{\rho_v} \hat{f}^{(v)}(u)\,du$.
Thus, there must exist a measurable subset of $[y, \rho]$ on which $\hat{f}' < \hat{f}^{(v)}$.
Since $\hat{f}^{(v)}$ is constant in $(y, \bar{\rho}]$, $\hat{f}'$ violates (25c).
Lastly, we show that if

$$\int_\rho^{\rho_v} \hat{f}^{(v)}(u)\,du > \int_\rho^{\rho_v} \hat{f}'(u,v)\,du \tag{31}$$

then even if (25c) holds, (25d) is violated.

There must exist a measurable subset $U \subset [\rho, \rho_v]$ on which
$\hat{f}^{(v)}(u) > \hat{f}'(u, v)$.

Let $\eta$ be the supremum of points $x \in [0, \rho)$ satisfying
$\int_x^{\rho_v} \hat{f}'(u, v)\,du = \lim_{u \to x^+} \underline{f}^{(v)}(u)$. Since this is satisfied by $x = 0$ and the
supremum cannot be satisfied by $\rho$, such $\eta < \rho$ is well defined.

Since $\hat{f}^{(v)}$ satisfies (25a), $\int_\eta^{\rho_v} \hat{f}^{(v)}(u)\,du \le \int_\eta^{\rho_v} \hat{f}'(u,v)\,du$.
Therefore, there must be another measurable set $W \subset [\eta, \rho]$ on
which $\hat{f}^{(v)}(u) < \hat{f}'(u, v)$. Because $\hat{f}^{(v)}$ is monotone, the values
of $\hat{f}'$ on $W$ are strictly larger than on $U$. Hence, $\hat{f}'$ violates (25d).
□

Theorem 4.1 follows by applying Theorem 7.1 to an empty specification.



Conclusion 

We developed a precise understanding of which queries we can es- 
timate accurately over coordinated samples, defined variance com- 
petitiveness, and showed that it is generally attainable. Our work 
uses a fresh, CS-inspired, and unified approach to the study of esti- 
mators that is particularly suitable for data analysis from samples. 

Looking forward, we plan to study the interaction of competitiveness
and variance optimality (an estimator is variance optimal
if it cannot be strictly improved) and also explore deriving customized
estimators for common patterns in the data. Beyond coordination,
we hope to better understand independent sampling [14].

On the applied front, our work is directly motivated by the prevalent
use of sampling as a synopsis of large data sets. We demonstrate
its potential for difference queries in a companion experimental paper [16].
In the longer run, we envision automated tools that provide
estimates according to specifications of the sampling scheme,
query, competitive ratio, and prioritization of patterns in the data.